Method and System for Determining Object Pose from Images

The present invention relates to a method and system for determining object pose from images such as still photographs, films or the like. In particular, the present invention is designed to allow a user to obtain a detailed estimation of the pose of a body, particularly a human body, from real-world images with unconstrained image features.

In the case of the human body, the task of obtaining pose information is made difficult by the large variation in human appearance. Sources of variation include scale, viewpoint, surface texture, illumination, self-occlusion, object occlusion, body structure and clothing shape. In order to deal with these many complicating factors, it is common in the prior art to use a high-level, hand-built shape model in which points on the shape model are associated with image measurements. A score can be computed and a search performed to find the best solutions, allowing the pose of the body to be determined.

A second approach identifies parts of the body and then assembles them into the best configuration. This approach does not model self-occlusion. Both approaches tend to rely on a fixed number of parts being parameterised. In addition, many human pose estimation methods use rigid geometric primitives such as cones and spheres to model body parts.

Furthermore, existing techniques identify the boundary between the foreground, in which the body part is situated, and the background, containing the rest of the scene shown in the image, by detecting the edges between these two regions.

Where the pose of a body is to be tracked through a series of images on a frame-by-frame basis, localised sampling of the images is used in the full-dimensional pose space. This approach usually requires manual initialisation and does not recover from significant tracking errors.

It is an object of the present invention to provide an improved method and system for identifying in an image the relative positions of parts of a pre-defined object (object pose), and to use this identification to analyse images in a number of technological application areas.

In accordance with a first aspect of the present invention there is provided a method of identifying an object or structured parts of an object in an image, the method comprising the steps of:

creating a set of templates, the set containing a template for each of a number of predetermined object parts, and applying said template to an area of interest in an image where it is hypothesised that an object part is present;

analysing image pixels in the area of interest to determine the likelihood that it contains the object part;

applying other templates from the set of templates to other areas of interest in the image to determine the probability that said area of interest belongs to a corresponding object part, and arranging the templates in a configuration;

calculating the likelihood that the configuration represents an object or structured parts of an object; and

calculating other configurations and comparing said configurations to determine the configuration that is most likely to represent an object or structured part of an object.

Preferably, the probability that an area of interest contains an object part is calculated by calculating a transformation from the co-ordinates of a pixel in the area of interest to the template.

Preferably, the step of analysing the area of interest further comprises identifying the dissimilarity between foreground and background of the template.

Preferably, the step of analysing the area of interest further comprises calculating a likelihood ratio based on a determination of the dissimilarity between foreground and background features of a transformed template.

Preferably, the templates are applied by aligning their centres, orientations (in 2D or 3D) and scales to the area of interest on the image.

Preferably, the template is a probabilistic region mask in which values indicate a probability of finding a pixel corresponding to an object part.

Optionally, the probabilistic region mask is estimated by segmentation of training images.

Optionally, the mask is a binary mask.

Preferably, the image is an unconstrained scene.

Preferably, the step of calculating the likelihood that the configuration represents an object or a structured part of an object comprises calculating a likelihood ratio for each object part and calculating the product of said likelihood ratios.

Preferably, the step of calculating the likelihood that the configuration represents an object comprises determining the spatial relationship of object part templates.

Preferably, the step of determining the spatial relationship of the object part templates comprises analysing the configuration to identify common boundaries between pairs of object part templates.

Optionally, the step of determining the spatial relationship of the object part templates requires identification of object parts having similar characteristics and defining these as a sub-set of the object part templates.

Preferably, the step of calculating the likelihood that 
the configuration represents an object or structured part 
of an object comprises calculating a link value for 
object parts which are physically connected. 

Preferably, the step of comparing said configurations 
comprises iteratively combining the object parts and 
predicting larger configurations of body parts. 

Preferably, the object is a human or animal body. 

In accordance with a second aspect of the invention there 
is provided a system for identifying an object or 
structured parts of an object in an image, the system 
comprising: 

a set of templates, the set containing a template for 
each of a number of predetermined object parts 
applicable to an area of interest in an image where it is 
hypothesised that an object part is present; 
analysis means for determining the likelihood that the 
area of interest contains the object part; 
configuring means capable of arranging the applied 
templates in a configuration; 

calculating means to calculate the likelihood that the 
configuration represents an object or structured parts of 
an object for a plurality of configurations; and 
comparison means to compare configurations so as to 
determine the configuration that is most likely to 
represent an object or structured part of an object. 

Preferably, the system further comprises imaging means 
capable of providing an image for analysis. 

More preferably, the imaging means is a stills camera or 
a video camera. 

Preferably, the analysis means is provided with means for 
identifying the dissimilarity between foreground and 
background of the template. 

Preferably, the analysis means calculates the probability 
that an area of interest contains an object part by 
calculating a transformation from the co-ordinates of a 
pixel in the area of interest to the template. 

Preferably, the analysis means calculates a likelihood 
ratio based on a determination of the dissimilarity 
between foreground and background features of a 
transformed template. 

Preferably, the templates are applied by aligning their 
centres, orientations (in 2D or 3D) and scales to the 
area of interest on the image. 

Preferably, the template is a probabilistic region mask 
in which values indicate a probability of finding a pixel 
corresponding to an object part. 

Optionally, the probabilistic region mask is estimated by 
segmentation of training images.

Optionally, the mask is a binary mask. 

Preferably, calculating the likelihood that the configuration represents an object comprises determining the spatial relationship of object part templates.



Preferably, the image is an unconstrained scene.

Preferably, the calculating means calculates a likelihood ratio for each object part and calculates the product of said likelihood ratios.

Preferably, the spatial relationship of the object part templates is calculated by analysing the configuration to identify common boundaries between pairs of object part templates.

Preferably, the spatial relationship of the object part templates is determined by identifying object parts having similar characteristics and defining these as a sub-set of the object part templates.



Preferably, the calculating means is capable of calculating a link value for object parts which are physically connected.

Preferably, the calculating means is capable of iteratively combining the object parts in order to predict larger configurations of body parts.



Preferably, the object is a human or animal body.



In accordance with a third aspect of the present invention there is provided a computer program comprising program instructions for causing a computer to perform the method of the first aspect of the invention.

Preferably, the computer program is embodied on a computer readable medium.

In accordance with a fourth aspect of the present invention there is provided a carrier having thereon a computer program comprising computer implementable instructions for causing a computer to perform the method of the first aspect of the present invention.

In accordance with a fifth aspect of the present invention there is provided a markerless motion capture system comprising imaging means and a system, according to the second aspect of the present invention, for identifying an object or structured parts of an object in an image.

The present invention will now be described by way of example only, with reference to the accompanying drawings, in which:

Figure 1a is a flow diagram showing the operational steps used in implementing an embodiment of the present invention, and Figure 1b is a detailed flow diagram of the steps provided in the likelihood module of the present invention;

Figures 2a (i) to 2a (viii) show a set of templates for a number of body parts, and Figures 2b (i) to 2b (iii) show a reduced set of templates;

Figure 3a shows a lower leg template, Figure 3b shows the lower leg template on an image, and Figure 3c illustrates the feature distributions of the background and foreground regions of the image at or near the template;

Figure 4a is a graph comparing the probability density of foreground and background appearance similarity for on and $\overline{\mathit{on}}$ (meaning not on the part) part configurations for a head template, and Figure 4b is a graph of the log of the resultant likelihood ratio;

Figure 5a is a column of typical images from both outdoor and indoor environments, Figure 5b is a column showing the projection of the positive log likelihood from the masks or templates, and Figure 5c is the projection of the positive log likelihood from the prior art edge-based model;

Figure 6a is a graph of the spatial variation of the learnt log likelihood ratios of the present invention, and Figure 6b is a graph of the spatial variation of the learnt log likelihood ratios of the prior art edge model;

Figure 7a is a graph of the probability density for paired and non-paired configurations, and Figure 7b is a plot of the log of the resulting likelihood ratio;

Figure 8a depicts an image of a body in an unconstrained background, and Figure 8b illustrates the projection of the likelihood ratio for the paired response to a person's lower right leg image; and

Figures 9a to 9d show results from a search for partial pose configurations.

The present invention provides a method and system for identifying an object such as a body in an image. The technology used to achieve this result is typically a combination of computer hardware and software.

Figure 1a shows a flow diagram of an embodiment of the present invention in which a still photograph of an unconstrained scene is analysed to identify the position of an object, in this example a human body, within the scene.

Firstly, an image is created 3 using standard photographic techniques or digital photography, and the image is transferred 5 into a computer system adapted to operate the method according to the present invention. Configuration prior 1 is data on the expected configuration of the body, based upon known earlier body poses or known constraints on body pose, such as the basic stance adopted by a person before taking a golf swing. This data can be used to assist with the overall analysis of body pose.

A configuration hypothesis generator 9 of a known type creates a configuration 10. The likelihood module 11 creates a score or likelihood 14, which is fed back to the configuration hypothesis generator 9. In this way, pose hypotheses are created and a pose output is selected, which is typically the best pose.
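The generate-and-score loop just described can be sketched as follows. This is a minimal illustration only: the patent leaves the hypothesis generator unspecified ("of a known type"), so the function name `search_pose`, the perturb-or-resample strategy and the `sample_prior` callable (standing in loosely for configuration prior 1) are all hypothetical, and a pose is reduced to a plain tuple of parameters.

```python
import random
from typing import Callable, Tuple

Pose = Tuple[float, ...]  # hypothetical flat vector of pose parameters


def search_pose(
    sample_prior: Callable[[random.Random], Pose],
    likelihood: Callable[[Pose], float],
    n_iters: int = 500,
    step: float = 0.1,
    seed: int = 0,
) -> Tuple[Pose, float]:
    """Generate pose hypotheses, score each one with a likelihood function
    and return the best-scoring pose together with its score."""
    rng = random.Random(seed)
    best = sample_prior(rng)
    best_score = likelihood(best)
    for _ in range(n_iters):
        # The score is "fed back": half of the new hypotheses perturb the
        # best pose found so far, the rest are fresh draws from the prior.
        if rng.random() < 0.5:
            candidate = tuple(v + rng.gauss(0.0, step) for v in best)
        else:
            candidate = sample_prior(rng)
        score = likelihood(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

For example, with a toy likelihood peaked at (1, -2), `search_pose(lambda r: (r.uniform(-5, 5), r.uniform(-5, 5)), lambda p: -((p[0] - 1) ** 2 + (p[1] + 2) ** 2), n_iters=2000)` returns a pose close to (1, -2). A real system would replace the toy likelihood with the learnt likelihood ratio described below.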

Figure 1b shows the operation of the likelihood module in more detail. A geometry analysis module 14 is used to analyse the geometry of the body parts by finding a mask for each part in the configuration, using the configuration to determine a transformation for each part from the part's mask to the image, and then inverting this transformation.

An appearance builder module 16 is used to analyse the pixels in an image in the following manner. For every pixel in the image, the inverse transform is used to find the corresponding position on each part's mask, and the probability from the mask is used to add the image features at that image location to the feature distributions.

An appearance evaluation module 18 is used to compare the 
foreground and background feature distributions for each 
part to get the single part likelihood. The foreground 
distributions are compared for each symmetric part to get 
the symmetry likelihood. The cues are combined to get the 
total likelihood. 



Details of the manner in which the above embodiment of 
the present invention is implemented will now be given 
with reference to figures 2 to 9. 

The shape of each of a number of body parts is modelled in the following manner. The body part, labelled here by $i$ ($i \in 1, \dots, N$), is represented using a single probabilistic region template, $M_i$, which represents the uncertainty in the part's shape without attempting to enable shape instances to be accurately reconstructed. This approach allows efficient sampling of the body part shape where the shape is obscured, for example when the subject is wearing loose-fitting clothing.

The probability that a pixel in the image at position $(x, y)$ belongs to a hypothesised body part $i$ is given by $M_i(T_i(x, y))$, where $T_i$ is a linear transformation from image co-ordinates to template (mask) co-ordinates determined by the part's centre, $(x_c, y_c)$, image-plane rotation, $\theta$, elongation, $e$, and scale, $s$. The elongation parameter alters the aspect ratio of the template and is used to approximate rotation in depth about one of the part's axes.
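As an illustration of $M_i(T_i(x, y))$, the sketch below builds $T_i$ from the four parameters named above and looks the result up in a template stored as a grid. The axis conventions are assumptions, not taken from the patent: the template's major axis is taken to be its v (row) axis, elongation e in (0, 1] is assumed to foreshorten only that axis, and the template origin is the grid centre with nearest-neighbour lookup. Because the part centre is subtracted first, the map is strictly a linear map plus a translation.

```python
import math
from typing import Callable, List, Tuple


def make_T(xc: float, yc: float, theta: float, e: float, s: float
           ) -> Callable[[float, float], Tuple[float, float]]:
    """Return T_i, mapping image co-ordinates (x, y) to template
    co-ordinates (u, v) for a part with centre (xc, yc), in-plane rotation
    theta (radians), elongation e in (0, 1] and scale s.

    Assumed convention: v is the template's major axis, and elongation
    foreshortens only that axis (e = 1 when the major axis is parallel to
    the image plane).
    """
    c, sn = math.cos(theta), math.sin(theta)

    def T(x: float, y: float) -> Tuple[float, float]:
        dx, dy = x - xc, y - yc             # move the part centre to the origin
        u = (c * dx + sn * dy) / s          # undo rotation, then scale (minor axis)
        v = (-sn * dx + c * dy) / (s * e)   # undo rotation, scale and elongation
        return u, v

    return T


def mask_prob(mask: List[List[float]], u: float, v: float) -> float:
    """Nearest-neighbour lookup of M_i(u, v). Rows index v, columns index u,
    the template origin is the grid centre, and points outside the grid
    have probability 0."""
    rows, cols = len(mask), len(mask[0])
    r = int(round(v + (rows - 1) / 2))
    k = int(round(u + (cols - 1) / 2))
    if 0 <= r < rows and 0 <= k < cols:
        return mask[r][k]
    return 0.0
```

A pixel's part probability is then `mask_prob(mask, *T(x, y))`. Bilinear interpolation would give smoother values than nearest-neighbour lookup at the cost of a little more arithmetic.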



The probabilities in the template are estimated from example shapes in the form of binary masks obtained by manual segmentation of training images in which the elongation is maximal (i.e. in which the major axis of the part is parallel to the image plane). These training examples are aligned by specifying their centres, orientations and scales. Un-parameterised pose variations are marginalised over, allowing a reduction in the size of the state space. Specifically, rotation about each limb's major axis is marginalised, since these rotations are difficult to observe. The templates can also be constrained to be symmetric about their minor axis.
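One simple way to turn aligned binary training masks into a probabilistic region template is a per-pixel average, optionally averaged with its own reflection to impose the minor-axis symmetry mentioned above. The patent does not spell out the estimator, so this is an assumed sketch; it also assumes the masks have already been aligned on a common grid with the part's major axis running along the rows.

```python
from typing import List

Mask = List[List[float]]


def learn_template(binary_masks: List[Mask], symmetric: bool = False) -> Mask:
    """Estimate a probabilistic region template as the per-pixel mean of
    aligned binary training masks (1 = part, 0 = not part)."""
    n = len(binary_masks)
    rows, cols = len(binary_masks[0]), len(binary_masks[0][0])
    prob = [[sum(m[r][c] for m in binary_masks) / n for c in range(cols)]
            for r in range(rows)]
    if symmetric:
        # Impose symmetry about the minor axis by averaging with the
        # reflected template; with the major axis along the rows, that
        # reflection simply reverses the row order.
        flipped = prob[::-1]
        prob = [[(a + b) / 2 for a, b in zip(row, frow)]
                for row, frow in zip(prob, flipped)]
    return prob
```

Pixels on which all training masks agree end up at 0 or 1; the intermediate values form the uncertain regions discussed below.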



Figures 2a (i) to (viii) show templates with masks for human body parts. Figure 2a (i) is a mask of a head, Figure 2a (ii) is a mask of a torso, Figure 2a (iii) is a mask of an upper arm, Figure 2a (iv) is a mask of a lower arm, Figure 2a (v) is a mask of a hand, Figure 2a (vi) is a mask of an upper leg, Figure 2a (vii) is a mask of a lower leg and Figure 2a (viii) is a mask of a foot.

In this example, upper and lower arm and leg parts can reasonably be represented using a single template. This reduced number of masks greatly improves the sampling efficiency.

Figures 2b (i) to (iii) show some learnt probabilistic region templates. Figure 2b (i) shows a head mask, Figure 2b (ii) shows a torso mask and Figure 2b (iii) shows a leg mask used in this example.

The uncertain regions in these templates exist because of (i) 3D shape variation due to changes of clothing and identity of the body, (ii) rotation in depth about the major axis, and (iii) inaccuracies in the alignment and manual segmentation of the training images.

In order to detect the body parts in an image, the dissimilarity between the appearance of the foreground and background of a transformed probabilistic region, as illustrated in Fig. 3, is determined. These appearances are represented as Probability Density Functions (PDFs) of intensity and chromaticity image features, resulting in 3D probability distributions.

In general, local filter responses could also be used to represent the appearance. Since texture can often result in multi-modal distributions, each PDF is encoded as a histogram (marginalised over position). For scenes in which the body parts appear small, semi-parametric density estimation methods such as Gaussian mixture models can be used.

The foreground appearance histogram for part $i$, denoted here by $F_i$, is formed by adding image features from the part's supporting region in proportion to $M_i(T_i(x, y))$. Similarly, the adjacent background appearance distribution, $B_i$, is estimated by adding features in proportion to $1 - M_i(T_i(x, y))$.
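The weighted accumulation of $F_i$ and $B_i$ can be sketched as below. Assumptions: each pixel's intensity and chromaticity have already been quantised into a hashable histogram bin, `p_part(x, y)` is a caller-supplied function returning $M_i(T_i(x, y))$, and the caller passes only pixels from a window around the hypothesised part, since how far the "adjacent" background extends is not specified here. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Tuple

Hist = Dict[Hashable, float]


def fg_bg_histograms(
    pixels: Iterable[Tuple[float, float, Hashable]],
    p_part: Callable[[float, float], float],
) -> Tuple[Hist, Hist]:
    """Return normalised foreground (F_i) and adjacent background (B_i)
    histograms. `pixels` yields (x, y, feature_bin) and p_part(x, y)
    returns M_i(T_i(x, y))."""
    F: Dict[Hashable, float] = defaultdict(float)
    B: Dict[Hashable, float] = defaultdict(float)
    for x, y, f in pixels:
        w = p_part(x, y)
        F[f] += w          # weight M_i(T_i(x, y)) towards the foreground
        B[f] += 1.0 - w    # weight 1 - M_i(T_i(x, y)) towards the background

    def normalise(h: Dict[Hashable, float]) -> Hist:
        total = sum(h.values())
        return {k: v / total for k, v in h.items()} if total > 0 else {}

    return normalise(F), normalise(B)
```

Because every pixel contributes to both histograms with complementary weights, no hard boundary between part and background ever has to be drawn.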

The foreground appearance will be less similar to the background appearance for configurations that are correct (denoted by on) than for those that are incorrect (denoted by $\overline{\mathit{on}}$). Therefore, a PDF of the Bhattacharyya measure given by Equation (1), which quantifies how much two probability density functions overlap, is learnt for on and $\overline{\mathit{on}}$ configurations.

The on distribution is estimated from data obtained by specifying the transformation parameters that align the probabilistic region template with parts that are neither occluded nor overlapping. The $\overline{\mathit{on}}$ distribution is estimated by generating random alignments elsewhere in sample images of outdoor and indoor scenes.



The on PDF can be adequately represented by a Gaussian distribution. Equation (2) defines $\mathrm{SINGLE}_i$ as the ratio of the on and $\overline{\mathit{on}}$ distributions. This ratio is used to score a single body part configuration and is plotted in Fig. 4.

$$ I(F_i, B_i) = \sum_{f} \sqrt{F_i(f) \times B_i(f)} \qquad (1) $$

$$ \mathrm{SINGLE}_i = \frac{p\big(I(F_i, B_i) \mid \mathit{on}\big)}{p\big(I(F_i, B_i) \mid \overline{\mathit{on}}\big)} \qquad (2) $$

Figure 4a is a graph comparing the probability density of foreground and background appearance similarity for on and $\overline{\mathit{on}}$ part configurations for a head template, and Figure 4b is a graph of the log of the resultant likelihood ratio. It is clear from Figure 4a that the on and $\overline{\mathit{on}}$ probability density distributions are well separated.
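Equations (1) and (2) translate directly into code. In this sketch, histograms are dictionaries of normalised bin weights and the two learnt PDFs are passed in as callables. The Gaussian helper reflects the statement that the on PDF is Gaussian, but the parameter values in the example are invented for illustration, and the small floor `eps` on the denominator is an added safeguard that the text does not mention.

```python
import math
from typing import Callable, Dict, Hashable

Hist = Dict[Hashable, float]


def bhattacharyya(p: Hist, q: Hist) -> float:
    """Equation (1): sum over bins f of sqrt(p(f) * q(f)).
    Equals 1 for identical normalised histograms and 0 for disjoint ones."""
    return sum(math.sqrt(p[f] * q[f]) for f in p.keys() & q.keys())


def gaussian_pdf(mean: float, std: float) -> Callable[[float], float]:
    """Return a 1D Gaussian density function."""
    norm = 1.0 / (std * math.sqrt(2.0 * math.pi))
    return lambda x: norm * math.exp(-0.5 * ((x - mean) / std) ** 2)


def single_ratio(F: Hist, B: Hist,
                 p_on: Callable[[float], float],
                 p_off: Callable[[float], float],
                 eps: float = 1e-12) -> float:
    """Equation (2): SINGLE_i = p(I(F_i, B_i) | on) / p(I(F_i, B_i) | not on)."""
    I = bhattacharyya(F, B)
    return p_on(I) / max(p_off(I), eps)
```

For example, with hypothetical densities `p_on = gaussian_pdf(0.2, 0.15)` and `p_off = gaussian_pdf(0.8, 0.15)`, a foreground histogram that does not overlap its background gives a ratio above 1, while identical foreground and background histograms give a ratio below 1.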

The present invention also provides enhanced discrimination of body parts by defining adjoining and non-adjoining regions.

Detection of single body parts can be improved by distinguishing positions where the background appearance is most likely to differ from the foreground appearance. For example, due to the structure of clothing, when detecting an upper arm, adjoining background areas around the shoulder joint are often similar to the foreground appearance. The histogram model proposed thus far, which marginalises appearance over position, does not use this information optimally.

To enhance discrimination, two separate adjacent background histograms are constructed, one for adjoining regions and another for non-adjoining regions. In the model, the non-adjoining region appearance is expected to be less similar to the foreground appearance than the adjoining region appearance.

The adjoining and non-adjoining regions can be specified manually during training by defining a hard threshold. Alternatively, a probabilistic approach could be used, in which the regions are estimated by marginalising over the relative pose between adjoining parts to obtain a low-dimensional model.



The use of information from adjoining regions is particularly useful where bottom-up identification of body parts is required.



Figures 5a to 5c show a set of images (Figure 5a) which have been analysed for part detection purposes using the present invention (Figure 5b) and using a prior art method (Figure 5c). Figure 5a is a column of typical images from both outdoor and indoor environments, Figure 5b is a column showing the projection of the positive log likelihood from the masks or templates, indicating where body parts are most likely to be present, and Figure 5c is the projection of the positive log likelihood from the prior art edge-based model.



The column of Fig. 5b shows the projection of the likelihood ratio computed using Equation (2) onto typical images containing significant background information or clutter. The top image of Figure 5b shows the response for a head, while the other two images show the response of a vertically orientated limb filter.



It can be seen that the technique of the present invention is highly discriminatory, producing relatively few false maxima in comparison with the prior art system. Although images were acquired using various cameras, some with noisy colour signals, system parameters were fixed for all test images.

In order to provide a comparison with an alternative method, the responses obtained by comparing the hypothesised part boundaries with edge responses were computed. These are shown in Fig. 5c. Orientations of significant edge responses for foreground and background configurations were learned (using derivatives of the probabilistic region template), treated as independent and normalised for scale. Contrast normalisation was not used. Other formulations (e.g. averaging) proved to be weaker on the scenes under consideration. The responses obtained using this method are clearly less discriminatory.

Figures 6a and 6b compare the spatial variation of the log of the learnt likelihood ratios of the present invention and of the prior art edge-based likelihood system for a head. In both Figures 6a and 6b, the correct position is centred and indicated by the vertical line 25. The horizontal bar 27 in both figures corresponds to a likelihood ratio of 1 (a log likelihood ratio of zero); positions above it are more likely to be a head than not. As can be seen by comparing Figures 6a and 6b, Figure 6b has a large number of positions where the likelihood ratio is greater than 1, whereas only a single instance of this occurs in Figure 6a.

The edge response, whilst indicative of the correct position of body parts, has significant false positive likelihood ratios. The part likelihood calculation used in the present invention is more expensive to compute; however, it is far more discriminatory and, as a result, fewer samples are needed when performing pose search, leading to an overall computational performance benefit. Furthermore, the collected foreground histograms can be useful for other likelihood measurements, as described below.

Since any single body part likelihood will probably 
result in false positives, the present invention provides 
for the encoding of higher order relationships between 
body parts, to improve discrimination. This is 
accomplished by encoding an expectation of structure in 
the foreground appearance and the spatial relationship of 
body parts. 

Configurations containing more than one body part can be represented using an extension of the probabilistic region approach described above. In order to account for self-occlusion, the pose space is represented by a depth-ordered set, V, of probabilistic regions, with parts sharing a common scale parameter, $s$. Taken together, the templates determine the probability that a particular image feature belongs to a particular part's foreground or background. More specifically, the probability that an image feature at position $(x, y)$ belongs to the foreground appearance of part $i$ is given by $M_i(T_i(x, y)) \times \prod_j \big(1 - M_j(T_j(x, y))\big)$, where $j$ labels closer, instantiated parts.

Therefore, a list of paired body parts is specified, and the background appearance histogram is constructed from features weighted by $\prod_k \big(1 - M_k(T_k(x, y))\big)$, where $k$ labels all instantiated parts other than $i$ and those paired with $i$.
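The foreground weighting for a depth-ordered configuration can be sketched for a single pixel as follows. The sketch assumes that the template probabilities $M_k(T_k(x, y))$ have already been evaluated at that pixel for every instantiated part, and that the depth order is supplied as a list from closest to farthest; the part names in the example are illustrative. Only the foreground weight, whose form is stated explicitly above, is covered.

```python
from typing import Dict, List


def foreground_weight(i: str, depth_order: List[str],
                      probs: Dict[str, float]) -> float:
    """Weight with which the feature at one pixel is added to the foreground
    histogram of part i: M_i(T_i(x, y)) times the product, over the parts j
    in front of i, of (1 - M_j(T_j(x, y))).

    depth_order lists the instantiated parts from closest to farthest, and
    probs[p] holds M_p(T_p(x, y)) already evaluated at this pixel.
    """
    w = probs[i]
    for j in depth_order:
        if j == i:
            break              # parts behind i cannot occlude it
        w *= 1.0 - probs[j]    # discount by the chance a closer part covers the pixel
    return w
```

For example, if a lower arm lies in front of the torso and covers a pixel with probability 0.75, a torso pixel with template probability 1.0 contributes to the torso foreground with weight 0.25 only.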

Thus, a single image feature can contribute to the foreground and adjacent background appearance of several parts. When insufficient data is available to estimate either the foreground or the adjacent background histogram (as determined using an area threshold), the corresponding likelihood ratio is set to one.

In order to define constraints between parts, a link is introduced between parts $i$ and $j$ if and only if they are physically connected neighbours. Each part has a set of control points that link it to its neighbours. A link has an associated value $\mathrm{LINK}_{i,j}$ given by

$$ \mathrm{LINK}_{i,j} = \begin{cases} 1 & \text{if } \delta_{i,j} < \Delta_{i,j} \\ e^{-(\delta_{i,j} - \Delta_{i,j})/\sigma} & \text{otherwise} \end{cases} \qquad (3) $$

where $\delta_{i,j}$ is the image distance between the control points of the pair, $\Delta_{i,j}$ is the maximum un-penalised distance and $\sigma$ relates to the strength of penalisation. If the neighbouring parts do not link directly, because intervening parts are not instantiated, the un-penalised distance is found by summing the un-penalised distances over the complete chain. This can be interpreted as analogous to a force between parts equivalent to a telescopic rod with a spring on each end.
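Equation (3), together with the rule for summing un-penalised distances along a chain of non-instantiated parts, can be sketched as below. The exponential penalty $e^{-(\delta - \Delta)/\sigma}$ used here is this document's reading of Equation (3) and should be treated as an assumption to be checked against the original specification; distances are assumed to be in pixels and the function name is hypothetical.

```python
import math
from typing import Sequence


def link_value(delta: float, max_dists: Sequence[float], sigma: float) -> float:
    """LINK_ij from Equation (3).

    delta     -- image distance between the linked control points
    max_dists -- un-penalised distances along the chain: one entry for
                 directly connected parts, several when intervening parts
                 are not instantiated (they are summed)
    sigma     -- strength of penalisation (larger means gentler)
    """
    Delta = sum(max_dists)
    if delta < Delta:
        return 1.0                       # the "telescopic rod" is slack: no penalty
    return math.exp(-(delta - Delta) / sigma)  # the "spring" stretches
```

For a lower leg linked to a hip through an uninstantiated upper leg, `max_dists` would hold both joint tolerances, so the link only starts to penalise once the combined slack is used up.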

A simplifying feature of the system is that certain pairs of body parts can be expected to have a similar foreground appearance to one another. For example, a person's upper left arm will nearly always have a similar colour and texture to the person's upper right arm. In the system of the present invention, the limbs are paired with their opposing parts. To encode this knowledge, a PDF of the divergence measure (computed using Equation (1)) between the foreground appearance histograms of paired parts and of non-paired parts is learnt.

Equation (4) shows the resulting likelihood ratio, and Figures 7a and 7b describe this ratio graphically. Figure 7a shows a plot of the learnt PDFs of the foreground appearance similarity for paired and non-paired configurations. The log of the resulting likelihood ratio is shown in Figure 7b. A higher probability of similarity is found for the paired configurations.

Figure 8 shows a typical image projection of this ratio and shows the technique to be highly discriminatory. It limits the possible configurations if one limb can be found reliably, and helps to reduce the likelihood of incorrect large assemblies.



$$ \mathrm{PAIR}_{i,j} = \frac{p\big(I(F_i, F_j) \mid \text{paired}\big)}{p\big(I(F_i, F_j) \mid \text{non-paired}\big)} \qquad (4) $$
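Equation (4) can be evaluated in the same way as the single-part ratio, but with the Bhattacharyya measure of Equation (1) taken between the foreground histograms of two parts. In this sketch the learnt paired and non-paired densities are passed in as callables; the linear toy densities used in the example are made up for illustration only, and `eps` is an added guard against division by zero.

```python
import math
from typing import Callable, Dict, Hashable

Hist = Dict[Hashable, float]


def bhattacharyya(p: Hist, q: Hist) -> float:
    # Equation (1), here applied to two foreground histograms.
    return sum(math.sqrt(p[f] * q[f]) for f in p.keys() & q.keys())


def pair_ratio(F_i: Hist, F_j: Hist,
               p_paired: Callable[[float], float],
               p_unpaired: Callable[[float], float],
               eps: float = 1e-12) -> float:
    """PAIR_ij = p(I(F_i, F_j) | paired) / p(I(F_i, F_j) | non-paired)."""
    I = bhattacharyya(F_i, F_j)
    return p_paired(I) / max(p_unpaired(I), eps)
```

With toy densities `p_paired = lambda I: 2 * I` and `p_unpaired = lambda I: 2 * (1 - I)`, two similarly coloured limbs yield a ratio well above 1, whereas limbs with no colours in common yield 0.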

Learning the likelihood ratios allows a principled fusion of the various cues and a principled comparison of the various hypothesised configurations. The individual likelihood ratios are combined by treating them as independent of one another. The overall likelihood ratio is given by Equation (5). Since each additional correctly placed part contributes factors greater than one, this rewards correct higher-dimensional configurations over correct lower-dimensional ones.

$$ R = \prod_i \mathrm{SINGLE}_i \times \prod_{i,j} \mathrm{PAIR}_{i,j} \times \prod_{i,j} \mathrm{LINK}_{i,j} \qquad (5) $$
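A product of many ratios is often more safely accumulated as a sum of logarithms, which avoids numerical underflow and overflow. The sketch below computes log R for Equation (5) under the same independence assumption, and uses `None` (a convention chosen here, not from the patent) to mark a SINGLE or PAIR ratio whose histograms fell below the area threshold and which therefore contributes a ratio of one.

```python
import math
from typing import Iterable, Optional


def log_overall_ratio(single: Iterable[Optional[float]],
                      pair: Iterable[Optional[float]],
                      link: Iterable[float]) -> float:
    """log R for Equation (5): the sum of the logs of all SINGLE_i, PAIR_ij
    and LINK_ij ratios, treated as independent. None means "too little
    data", i.e. a ratio of one, which adds log(1) = 0."""
    total = 0.0
    for ratio in [*single, *pair, *link]:
        if ratio is None:
            continue
        if ratio <= 0.0:
            return float("-inf")  # a zero ratio rules the configuration out
        total += math.log(ratio)
    return total
```

Adding a correctly placed part whose ratio exceeds one increases log R, so `log_overall_ratio([2.0, 3.0], [], [])` exceeds `log_overall_ratio([2.0], [], [])`, which is how configurations with different numbers of parts can be compared on one scale.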

As is apparent from the above equation, the present invention enables different hypothesised configurations to have differing numbers of parts, and yet allows a comparison to be made between them in order to decide which (partial) configuration to infer given the image evidence.

The parts in the inferred configuration may not be directly physically connected (e.g. the inferred configuration might consist of a lower leg, an arm and a head in a given scene, either because the other parts are occluded or because their boundaries are not readily apparent from the image).

An example of a sampling scheme useable with the present 
invention is described as follows. 

A coarse regular scan of the image for the head and limbs is made and these results are then locally optimised. Part configurations are sampled from the resulting distribution and combined to form larger configurations, which are then optimised for a fixed period of time in the full-dimensional pose space.
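The first stage of this scheme, a coarse regular scan followed by local optimisation, can be sketched for a single part as below. The three-parameter state (centre and in-plane rotation), the grid spacing and the greedy step-halving optimiser are assumptions made for illustration; the patent does not name a particular local optimiser, and the later sampling, combination and full-dimensional optimisation stages are not shown.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float, float]  # hypothetical part state: (x_c, y_c, theta)


def coarse_scan(score: Callable[[State], float], xs: Sequence[float],
                ys: Sequence[float], thetas: Sequence[float],
                keep: int = 5) -> List[State]:
    """Score every state on a regular grid and return the `keep` best."""
    grid = itertools.product(xs, ys, thetas)
    return sorted(grid, key=score, reverse=True)[:keep]


def refine(score: Callable[[State], float], start: State,
           steps: Tuple[float, float, float] = (1.0, 1.0, 0.1),
           n_rounds: int = 20) -> State:
    """Greedy local optimisation: try one step down and up in each
    parameter, keep any improvement, and halve all step sizes when a whole
    round brings no improvement."""
    best, best_score = start, score(start)
    step = list(steps)
    for _ in range(n_rounds):
        improved = False
        for d in range(len(step)):
            for sign in (-1.0, 1.0):
                cand = list(best)
                cand[d] += sign * step[d]
                cand_t = tuple(cand)
                cand_score = score(cand_t)
                if cand_score > best_score:
                    best, best_score, improved = cand_t, cand_score, True
        if not improved:
            step = [v / 2 for v in step]
    return best
```

In a full system `score` would be the single-part likelihood ratio, and the refined states from the scan would seed the sampling of larger configurations.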

Due to the flexibility of the parameterisation, a set of 
optimization methods such as genetic style combination, 
prediction, local search, re-ordering and re-labelling 
can be combined using a scheduling algorithm and a shared 
sample population to achieve rapid, robust, global, high 
dimensional pose estimation. 

Fig. 9 shows results of searching for partial pose configurations. The areas enclosed by the white lines 31, 33, 35, 37, 39, 41, 43, 45, 47 and 49 identify these pose configurations. Although inter-part links are not visualised in this example, these results represent estimates of pose configurations with inter-part connectivity, as opposed to independently detected parts. The scale of the model was fixed and the elongation parameter was constrained to be above 0.7.

The system of the present invention described above allows detailed, efficient estimation of human pose from real-world images.



The invention provides (i) a formulation that allows the representation and comparison of partial (lower-dimensional) solutions and that models occlusion by other objects, and (ii) a highly discriminatory learnt likelihood, based upon probabilistic regions, that allows efficient body part detection.

The likelihood depends only on there being differences between a hypothesised part's foreground appearance and the appearance of the adjacent background. The present invention does not make use of scene-specific background models and is therefore general and applicable to unconstrained scenes.
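
As an illustration only, the sketch below collects the two appearance distributions for one hypothesised part. It assumes that image pixels are already quantised into appearance bins, that the template window includes a margin of adjacent background, that each pixel contributes weight p to the foreground histogram and 1 - p to the background histogram, and that similarity is measured with the Bhattacharyya coefficient; none of these specific choices is prescribed by the text, and the function names are hypothetical.

```python
import math


def normalise(h):
    total = sum(h)
    return [v / total for v in h] if total > 0 else h


def region_histograms(image, mask, pose, n_bins):
    """Foreground and adjacent-background histograms for one part hypothesis.

    image: 2-D list of quantised appearance bins (0..n_bins-1)
    mask:  2-D list of foreground probabilities (probabilistic region template)
    pose:  (cx, cy, theta, scale) placing the template centre at (cx, cy)
    """
    cx, cy, theta, scale = pose
    c, s = math.cos(theta), math.sin(theta)
    mh, mw = len(mask), len(mask[0])
    uc, vc = (mw - 1) / 2, (mh - 1) / 2
    fg, bg = [0.0] * n_bins, [0.0] * n_bins
    for y, row in enumerate(image):
        for x, b in enumerate(row):
            # Map the image pixel into template coordinates (inverse rotation/scale).
            dx, dy = x - cx, y - cy
            u = (c * dx + s * dy) / scale + uc
            v = (-s * dx + c * dy) / scale + vc
            iu, iv = round(u), round(v)  # nearest-neighbour template lookup
            if 0 <= iu < mw and 0 <= iv < mh:
                p = mask[iv][iu]
                fg[b] += p          # weighted by probability of belonging to the part
                bg[b] += 1.0 - p    # remainder counts as adjacent background
    return normalise(fg), normalise(bg)


def bhattacharyya(h1, h2):
    """Similarity in [0, 1] of two normalised histograms; 1 means identical."""
    return sum(math.sqrt(a * b) for a, b in zip(h1, h2))


# Toy template: certain foreground in the inner 3x3, background in the outer ring.
mask = [[1.0 if 1 <= i <= 3 and 1 <= j <= 3 else 0.0 for i in range(5)] for j in range(5)]
contrast = [[1 if 3 <= x <= 5 and 3 <= y <= 5 else 0 for x in range(9)] for y in range(9)]
uniform = [[0] * 9 for _ in range(9)]
pose = (4, 4, 0.0, 1.0)
sim_contrast = bhattacharyya(*region_histograms(contrast, mask, pose, 2))  # 0.0
sim_uniform = bhattacharyya(*region_histograms(uniform, mask, pose, 2))    # 1.0
```

A correctly placed part on a distinct background yields low similarity, whereas a placement on featureless background yields high similarity.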

The system can be used to locate and estimate the pose of a person in a single monocular image. In other examples, the present invention can be used during tracking of the person through a sequence of images by combining it with a temporal pose prior propagated from other images in the sequence. In this case, it allows tracking of the body parts to reinitialise after partial or full occlusion, or after tracking of certain body parts temporarily fails for some other reason.
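
One way the temporal pose prior might be combined with the image likelihood is sketched below. The mixture prior (a Gaussian about the pose predicted from the previous image plus a small uniform component) is an assumption introduced here: the uniform term keeps the prior from vanishing far from the prediction, so a sufficiently strong image likelihood can reinitialise the estimate after occlusion. The parameter values and function names are hypothetical.

```python
import math


def temporal_log_prior(pose, predicted, sigma, eps, volume):
    """Log of a hypothetical mixture prior: Gaussian about the predicted pose
    plus a small uniform component over a pose-space volume `volume`."""
    k = len(pose)
    d2 = sum((a - b) ** 2 for a, b in zip(pose, predicted))
    gauss = math.exp(-0.5 * d2 / sigma ** 2) / (2 * math.pi * sigma ** 2) ** (k / 2)
    return math.log((1 - eps) * gauss + eps / volume)


def choose_pose(candidates, log_ratio, predicted, sigma=2.0, eps=0.05, volume=1e4):
    """Pick the candidate maximising log likelihood ratio + log temporal prior."""
    return max(candidates,
               key=lambda p: log_ratio(p) + temporal_log_prior(p, predicted, sigma, eps, volume))


predicted = (10.0, 10.0)
candidates = [(11.0, 10.0), (50.0, 50.0)]
strong_far = {(11.0, 10.0): 1.0, (50.0, 50.0): 20.0}  # made-up log ratios
weak_far = {(11.0, 10.0): 1.0, (50.0, 50.0): 1.0}
print(choose_pose(candidates, strong_far.get, predicted))  # (50.0, 50.0)
print(choose_pose(candidates, weak_far.get, predicted))    # (11.0, 10.0)
```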

In a further embodiment, the present invention can be used in a multi-camera system to estimate the person's pose from several views captured simultaneously.

Many other applications follow from this ability to identify a body or structured parts of a body in an image (body pose information). In one embodiment of the present invention, the body pose information so determined can be used as control input to drive a computer game or some other motion-driven or gesture-driven human-computer interface.

In another embodiment of the present invention, the body pose information can be used to control computer graphics, for example an avatar.

17 In another embodiment of the present invention, 

18 information on the body pose of a person obtained from an 

19 image can be used in the context of an art installation 

20 or a museum installation to enable the installation to 

21 respond interactively to the person's body movements. 
22 

In another embodiment of the present invention, the detection and pose estimation of people, in video images in particular, can be used as part of automated monitoring and surveillance applications such as security or care of the elderly.

In another embodiment of the present invention, the system could be used as part of a markerless motion-capture system for use in animation for entertainment and in gait analysis. In particular, it could be used to analyse golf swings or other sports actions. The system could also be used to analyse image/video archives or as part of an image indexing system.

Some of the features of the invention can be modified or replaced by alternatives. For example, the use of histograms could be replaced by some other method of estimating a frequency distribution (e.g. mixture models, Parzen windows) or by another feature representation. Different methods for comparing feature representations could be used (e.g. chi-squared, histogram intersection).
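
For reference, the two comparison measures mentioned can be written as follows for normalised histograms. The chi-squared distance has several variants (some include a factor of 1/2); the form below is one common choice, and the function names are illustrative.

```python
def chi_squared(h1, h2):
    """One common form of the chi-squared distance (0 for identical histograms)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def histogram_intersection(h1, h2):
    """Similarity of two normalised histograms (1 if identical, 0 if disjoint)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


same = chi_squared([0.5, 0.5], [0.5, 0.5])                 # 0.0
disjoint = chi_squared([1.0, 0.0], [0.0, 1.0])             # 2.0
overlap = histogram_intersection([0.25, 0.75], [0.5, 0.5])  # 0.75
```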

The part detectors could use other features (e.g. responses of local filters such as gradient filters, Gaussian derivatives or Gabor functions).
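
As an example of such a local filter, the sketch below computes a horizontal Gaussian-derivative response along one image row. The kernel weights are proportional to t * exp(-t^2 / (2 sigma^2)) (a sign-flipped, rescaled sampled Gaussian derivative applied as a correlation) and are normalised so that a linear ramp yields its slope. Border clamping and the restriction to a single row are simplifying assumptions, and the function names are hypothetical.

```python
import math


def gaussian_derivative_kernel(sigma):
    """Derivative-of-Gaussian weights w(t), scaled so that sum(t * w(t)) == 1."""
    r = int(math.ceil(3 * sigma))
    ts = range(-r, r + 1)
    g = [math.exp(-t * t / (2 * sigma * sigma)) for t in ts]
    norm = sum(t * t * gt for t, gt in zip(ts, g))
    return r, [t * gt / norm for t, gt in zip(ts, g)]


def row_gradient(row, sigma=1.0):
    """Horizontal Gaussian-derivative response along one row (borders clamped)."""
    r, w = gaussian_derivative_kernel(sigma)
    n = len(row)
    return [sum(wk * row[min(max(x + t, 0), n - 1)] for t, wk in zip(range(-r, r + 1), w))
            for x in range(n)]


ramp_resp = row_gradient([2.0 * x for x in range(20)])  # about 2.0 away from borders
flat_resp = row_gradient([5.0] * 20)                    # about 0.0 everywhere
```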

The parts could be parameterised to model perspective projection. The search over configurations could incorporate any number of the widely known methods for high-dimensional search, instead of or in combination with the methods mentioned above.



The population-based search could use any number of heuristics to help bootstrap the search (e.g. background subtraction, skin colour or other prior appearance models, change/motion detection).
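
A simple change-detection heuristic of this kind might look as follows. Thresholded differencing against a reference frame and seeding on a coarse grid are assumptions made for illustration; `change_mask` and `seed_positions` are hypothetical names.

```python
def change_mask(frame, reference, threshold):
    """True where a pixel differs from the reference frame by more than threshold."""
    return [[abs(a - b) > threshold for a, b in zip(frow, rrow)]
            for frow, rrow in zip(frame, reference)]


def seed_positions(mask, stride):
    """Coarse grid positions that fall on changed pixels, used to seed the search."""
    return [(x, y)
            for y in range(0, len(mask), stride)
            for x in range(0, len(mask[0]), stride)
            if mask[y][x]]


# Toy 8x8 example: a bright region appears in the lower-right quadrant.
reference = [[0] * 8 for _ in range(8)]
frame = [[100 if x >= 4 and y >= 4 else 0 for x in range(8)] for y in range(8)]
seeds = seed_positions(change_mask(frame, reference, threshold=10), stride=2)
# seeds == [(4, 4), (6, 4), (4, 6), (6, 6)]
```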

The system presented here is novel in several respects. The formulation allows differing numbers of parts to be parameterised and allows poses of differing dimensionality to be compared in a principled manner based upon learnt likelihood ratios. In contrast with current approaches, this allows a part-based search in the presence of self-occlusion. Furthermore, it provides a principled, automatic approach to occlusion by other objects. View-based probabilistic models of body part shapes are learnt that represent intra- and inter-person variability (in contrast to rigid geometric primitives).
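
Consistent with estimating the probabilistic region mask by segmentation of training images, a minimal sketch is to average aligned binary segmentations pixel by pixel. It assumes that the training masks have already been aligned to a common centre, orientation and scale; `estimate_region_mask` is a hypothetical name.

```python
def estimate_region_mask(masks):
    """Pixel-wise mean of aligned binary part segmentations.

    Each value is the fraction of training examples in which that template
    pixel belongs to the part, i.e. an estimate of the foreground probability.
    """
    if not masks:
        raise ValueError("need at least one training mask")
    h, w = len(masks[0]), len(masks[0][0])
    return [[sum(m[y][x] for m in masks) / len(masks) for x in range(w)]
            for y in range(h)]


# Two tiny hypothetical segmentations of the same part.
region = estimate_region_mask([[[1, 0], [1, 1]], [[1, 0], [0, 1]]])
# region == [[1.0, 0.0], [0.5, 1.0]]
```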

The probabilistic region template for each part is transformed into the image using the configuration hypothesis. The probabilistic region is also used to collect the appearance distributions for the part's foreground and adjacent background. Likelihood ratios for single parts are learnt from the dissimilarity of the foreground and adjacent background appearance distributions. This technique does not use restrictive foreground- or background-specific modelling.
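
The learning of a single-part likelihood ratio from foreground/background similarity can be sketched with histograms. Here it is assumed that similarity values lie in [0, 1], that `pos_sims` are measured at correctly placed parts in training images and `neg_sims` at other placements, and that add-one (Laplace) smoothing is used; these choices and the function names are illustrative assumptions.

```python
def learn_ratio_table(pos_sims, neg_sims, n_bins=10, alpha=1.0):
    """Per-bin likelihood ratio p(s | part) / p(s | not part) from training similarities."""
    def hist(values):
        counts = [alpha] * n_bins  # add-alpha smoothing avoids empty bins (assumption)
        for s in values:
            counts[min(int(s * n_bins), n_bins - 1)] += 1
        total = sum(counts)
        return [c / total for c in counts]

    pos, neg = hist(pos_sims), hist(neg_sims)
    return [p / n for p, n in zip(pos, neg)]


def likelihood_ratio(table, s):
    """Look up the learnt ratio for a similarity value s in [0, 1]."""
    return table[min(int(s * len(table)), len(table) - 1)]


# Hypothetical training data: true parts tend to differ from their background.
table = learn_ratio_table([0.05, 0.1, 0.15, 0.2], [0.8, 0.85, 0.9, 0.95])
```

Low foreground/background similarity then maps to a ratio above 1 and high similarity to a ratio below 1.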

The present invention provides better discrimination of body parts in real-world images than contour-to-edge matching techniques. Furthermore, the resulting likelihood is less sparse and less noisy, which makes coarse sampling and local search more effective.

Improvements and modifications may be incorporated herein without deviating from the scope of the invention.

Claims

1. A method of identifying an object or structured parts of an object in an image, the method comprising the steps of:

creating a set of templates, the set containing a 
template for each of a number of predetermined object 
parts and applying said template to an area of interest 
in an image where it is hypothesised that an object part 
is present; 

analysing image pixels in the area of interest to 
determine the probability that it contains the object 
part; 

applying other templates from the set of templates to 
other areas of interest in the image to determine the 
probability that said area of interest belongs to a 
corresponding object part and arranging the templates in 
a configuration; 

calculating the likelihood that the configuration 
represents an object or structured parts of an object; 
and 

calculating other configurations and comparing said 
configurations to determine the configuration that is 
most likely to represent an object or structured part of 
an object. 

2. A method as claimed in Claim 1 wherein, the 
probability that an area of interest contains an object 
part is calculated by calculating a transformation from 
the co-ordinates of a pixel in the area of interest to the template.

3. A method as claimed in Claim 1 or Claim 2 wherein, analysing the area of interest further comprises identifying the dissimilarity between foreground and background of a transformed probabilistic region.

4. A method as claimed in any preceding claim wherein, analysing the area of interest further comprises calculating a likelihood ratio based on a determination of the dissimilarity between foreground and background features of a transformed template.

5. A method as claimed in any preceding claim wherein, the templates are applied by aligning their centres, orientations in 2D or 3D and scales to the area of interest on the image.

6. A method as claimed in any preceding Claim wherein the template is a probabilistic region mask in which values indicate a probability of finding a pixel corresponding to an object part.

7. A method as claimed in any preceding claim wherein, the probabilistic region mask is estimated by segmentation of training images.

8. A method as claimed in any preceding claim wherein, the image is an unconstrained scene.

9. A method as claimed in any preceding claim wherein, the step of calculating the likelihood that the configuration represents an object or a structured part of an object comprises calculating a likelihood ratio for each object part and calculating the product of said likelihood ratios.

10. A method as claimed in any preceding claim wherein, the step of calculating the likelihood that the configuration represents an object comprises determining the spatial relationship of object part templates.

11. A method as claimed in Claim 10 wherein the step of determining the spatial relationship of the object part templates comprises analysing the configuration to identify common boundaries between pairs of object part templates.

12. A method as claimed in Claim 11 wherein the step of determining the spatial relationship of the object part templates requires identification of object parts having similar characteristics and defining these as a sub-set of the object part templates.

13. A method as claimed in any preceding claim, wherein the step of calculating the likelihood that the configuration represents an object or structured part of an object comprises calculating a link value for object parts which are physically connected.

14. A method as claimed in any preceding claim wherein the step of comparing said configurations comprises iteratively combining the object parts and predicting larger configurations of body parts.

15. A method as claimed in any preceding claim wherein the object is a human or animal body.

16. A system for identifying an object or structured parts of an object in an image, the system comprising:
a set of templates, the set containing a template for each of a number of predetermined object parts applicable to an area of interest in an image where it is hypothesised that an object part is present;

analysis means for determining the probability that the 
area of interest contains the object part; 
configuring means capable of arranging the applied 
templates in a configuration; 

calculating means to calculate the likelihood that the 
configuration represents an object or structured parts of 
an object for a plurality of configurations; and 
comparison means to compare configurations so as to 
determine the configuration that is most likely to 
represent an object or structured part of an object. 

17. A system as claimed in Claim 16 wherein, the system 
further comprises imaging means capable of providing an 
image for analysis. 

18. A system as claimed in claim 17 wherein the imaging 
means is a stills camera or a video camera. 

19. A system as claimed in Claims 16 to 18 wherein, the 
analysis means is provided with means for identifying the 
dissimilarity between foreground and background of a 
transformed probabilistic region. 

20. A system as claimed in Claims 16 to 19 wherein, the 
analysis means calculates the probability that an area of 
interest contains an object part by calculating a 



WO 2004/095373 PCT/GB2U04/00 1545 

transformation from the co-ordinates of a pixel in the area of interest to the template.

21. A system as claimed in any of Claims 16 to 20 wherein, the analysis means calculates a likelihood ratio based on a determination of the dissimilarity between foreground and background features of a transformed template.

22. A system as claimed in Claims 16 to 21 wherein, the templates are applied by aligning their centres, orientations (in 2D or 3D) and scales to the area of interest on the image.

23. A system as claimed in any of Claims 16 to 22 wherein the template is a probabilistic region mask in which values indicate a probability of finding a pixel corresponding to the body part.

24. A system as claimed in any of Claims 16 to 22 wherein, the probabilistic region mask is estimated by segmentation of training images.

25. A system as claimed in Claims 16 to 24 wherein, the image is an unconstrained scene.

26. A system as claimed in Claims 16 to 25 wherein, the calculating means calculates a likelihood ratio for each object part and calculates the product of said likelihood ratios.

27. A system as claimed in Claim 26 wherein, the likelihood that the configuration represents an object comprises determining the spatial relationship of object part templates.

28. A system as claimed in Claim 27 wherein the spatial relationship of the object part templates is calculated by analysing the configuration to identify common boundaries between pairs of object part templates.

29. A system as claimed in Claim 28 wherein the spatial relationship of the object part templates is determined by identifying object parts having similar characteristics and defining these as a sub-set of the object part templates.

30. A system as claimed in any preceding claim, wherein the calculating means is capable of calculating a link value for object parts which are physically connected.

32. A system as claimed in any of claims 16 to 31 wherein the calculating means is capable of iteratively combining the object parts in order to predict larger configurations of body parts.

33. A system as claimed in any of Claims 16 to 32 wherein the object is a human or animal body.

34. A computer program comprising program instructions for causing a computer to perform the method of any of Claims 1 to 15.

35. A computer program as claimed in claim 34 wherein the computer program is embodied on a computer readable medium.

36. A carrier having thereon a computer program comprising computer implementable instructions for causing a computer to perform the method of any of claims 1 to 15.

37. A markerless motion capture system comprising imaging means and a system for identifying an object or structured parts of an object in an image as claimed in any of Claims 16 to 33.



10/553664 






SUBSTITUTE SHEET (RULE 26) 







[Figure: plots against "Similarity of Foreground Appearance to Adjacent Background Appearance" (0 to 1); two plots against "Position"; two plots against "Paired Appearance Similarity" (0 to 1).]






